Introduction

- What is Linux
- Development Framework
- Design Issues
- Current Design
- Development Cycle
- Work in Progress

What is Linux

- UNIX clone for PC clones
- Source compatible with UNIX
- Full UNIX semantics
  - networking
  - virtual memory
  - crash protection
  - multiple filesystems
- FREE

Development Framework

- Open development tree
  - Fast feedback
- Modularized development
- Ridiculous number of different devices

Design Issues

- KISS
- Compatibility
  - Don't rewrite all the programs too
- Performance
- Future expansion
  - Avoid things that lock us to a certain design

Current Design

- Monolithic
  - Easier to handle
  - But: loadable modules
- Non-preemptible
  - Avoid locking
  - Race conditions manageable
- Machine specific: device drivers
  - 50%+ of the code is device drivers

Development Cycle

- Major releases
  - 1.0.0, eventually 2.0.0
- “User” kernels
  - No actual development
  - Stable
  - 1.0.x, 1.2.x, ...
- “Hacker” kernels
  - All development goes here
  - May or may not work for you...
  - 1.1.x, 1.3.x, ...

Work in Progress

- Kernel threads
- Memory management
- Filesystem
  - name and data caching
- Loadable modules
- Porting
  - Alpha, MIPS, MC680x0, PowerPC, Sparc
- Binary compatibility
  - DOS, iBCS2, Windows

Kernel

- Coding kernel vs user level
- Data Structures
- Kernel Traps
- Signal Handling
- Scheduling

Kernel vs User level

- Inherently multi-threaded
  - Race conditions
  - Deadlocks
- Security
  - Parameter checking
  - You don’t get errors in kernel mode...
- Stack and memory allocation:
  - Limited resource
  - Non-swappable

Per-process Data Structures

- Kernel stack
- User memory space
- “current”: process descriptor
  - per-thread info
  - memory management information
  - filesystem state information
  - open file information

Trapping into the kernel

- system calls
  - “int 0x80”
  - Arguments in registers
- interrupts
- faults
  - Benign:
    - Page fault
    - coprocessor missing fault
  - Malign:
    - Protection faults

Signal handling

- Internally two per-process bit-fields

      unsigned long signal;
      unsigned long masked;

- ... but UNIX semantics complicated
  - system call restarting
- Code that expects to be interrupted

      interruptible_sleep_on();
      if (current->signal & ~current->masked)

- Return to user mode triggers signal handling
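
A minimal user-space sketch of the two bit-fields, assuming bit n-1 stands for signal n. `struct task_signals`, `send_sig_bit()`, `signal_pending_unmasked()` and `next_signal()` are hypothetical names, not kernel functions. The sketch shows the `signal & ~masked` test an interruptible sleep makes, and one way the return-to-user-mode path could pick the lowest deliverable signal.

```c
#include <assert.h>

/* Hypothetical model of the per-process bit-fields:
 * bit (n - 1) represents signal number n. */
struct task_signals {
    unsigned long signal;  /* pending signals */
    unsigned long masked;  /* blocked signals */
};

/* Posting a signal only sets its pending bit. */
static void send_sig_bit(struct task_signals *t, int sig)
{
    assert(sig >= 1 && sig <= (int)(8 * sizeof(unsigned long)));
    t->signal |= 1UL << (sig - 1);
}

/* Non-zero if some pending signal is not blocked: the test an
 * interruptible sleep makes before returning early. */
static unsigned long signal_pending_unmasked(const struct task_signals *t)
{
    return t->signal & ~t->masked;
}

/* Return the lowest deliverable signal and clear its pending bit,
 * or 0 if nothing can be delivered. Blocked signals stay pending. */
static int next_signal(struct task_signals *t)
{
    unsigned long pending = signal_pending_unmasked(t);
    int sig;

    for (sig = 1; pending; sig++, pending >>= 1) {
        if (pending & 1UL) {
            t->signal &= ~(1UL << (sig - 1));
            return sig;
        }
    }
    return 0;
}
```

A blocked signal is not lost: its bit stays set in `signal` until the mask is lifted.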


Australia September 1994


Linux Tutorial 


Scheduling

- Problems
  - interactive response without over-scheduling
  - multiple events, race conditions
- Weighted process priorities
  - No run-queue
  - No absolute priorities
- Re-scheduling
  - on return to user mode
  - explicit call to “schedule()”
  - implicit blocking operations
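
A rough user-space model of weighted priorities without a run queue. It is loosely modelled on the Linux 1.x scheduler loop, not copied from it. Every task is scanned, and the runnable one with the most time slice left wins. When all runnable slices are used up, every counter is recomputed as counter/2 + priority. `struct task` and `pick_next()` are hypothetical, and priorities are assumed to be positive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task record. */
struct task {
    int runnable;  /* non-zero if the task wants the CPU */
    int counter;   /* remaining time slice, in clock ticks */
    int priority;  /* weight: ticks granted per recalculation (> 0) */
};

/* Return the index of the runnable task with the largest counter, or
 * -1 if nothing is runnable. If every runnable task has a zero counter,
 * recompute all counters and scan again. */
static int pick_next(struct task *tasks, size_t n)
{
    for (;;) {
        int best = -1, c = -1;
        size_t i;

        for (i = 0; i < n; i++) {
            if (tasks[i].runnable && tasks[i].counter > c) {
                c = tasks[i].counter;
                best = (int)i;
            }
        }
        if (c != 0)        /* -1: idle; > 0: someone has time left */
            return best;

        /* Sleeping tasks keep half of their unused slice as a bonus,
         * which favours interactive processes without needing
         * absolute priorities. */
        for (i = 0; i < n; i++)
            tasks[i].counter = (tasks[i].counter >> 1) + tasks[i].priority;
    }
}
```

The sleeping task (index 2 in the test) ends up with the largest counter, so it is favoured as soon as it wakes.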

Scheduling, part 2

- Wait lists

      add_wait_queue(wait_list, wait_entry);
      remove_wait_queue(wait_list, wait_entry);

- Sleeping

      current->state = TASK_INTERRUPTIBLE;
      schedule();

- Wakeup

      wake_up_interruptible(wait_list);
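
A user-space model of the three steps above. It assumes a wait list is a singly linked list of entries, each pointing at a sleeping task. `add_wait()`, `remove_wait()` and `wake_interruptible()` are hypothetical stand-ins, not the kernel functions. `schedule()` itself is not modelled, only the state changes around it.

```c
#include <assert.h>
#include <stddef.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task { enum task_state state; };

/* One entry per sleeper; it normally lives on the sleeper's stack. */
struct wait_entry {
    struct task *task;
    struct wait_entry *next;
};

/* Push an entry onto the front of the list. */
static void add_wait(struct wait_entry **list, struct wait_entry *w)
{
    w->next = *list;
    *list = w;
}

/* Unlink an entry wherever it sits in the list. */
static void remove_wait(struct wait_entry **list, struct wait_entry *w)
{
    while (*list && *list != w)
        list = &(*list)->next;
    if (*list)
        *list = w->next;
}

/* Make every interruptible sleeper runnable. Entries stay on the list:
 * each woken task re-checks its condition and removes itself, which is
 * why a spurious wakeup is harmless. */
static void wake_interruptible(struct wait_entry *list)
{
    for (; list; list = list->next)
        if (list->task->state == TASK_INTERRUPTIBLE)
            list->task->state = TASK_RUNNING;
}
```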

Filesystems

- VFS layer
- Minix FS
- Ext2 FS
- NFS filesystem
- proc filesystem

VFS Layer

- Virtualizes the filesystem interface
  - independent filesystems
  - give the user a unified filesystem layout
- Handles basic filesystem operations
  - mount points and caching
  - filesystem independent operations
- No real native filesystem; instead:
  - ext2, ext, xia, minix, nfs, sysv, OS/2, msdos, iso9660 and proc filesystems

VFS Function Switch

- Superblock operations:

      read_inode(), notify_change(), write_inode(),
      put_inode(), put_super(), write_super(),
      statfs(), remount_fs()

- Inode operations:

      create(), lookup(), link(), unlink(),
      symlink(), mkdir(), rmdir(), mknod(),
      rename(), readlink(), follow_link(), bmap(),
      truncate(), permission()

- File operations:

      lseek(), read(), write(), readdir(), select(),
      ioctl(), mmap(), open(), release(), fsync(),
      fasync(), check_media_change(), revalidate()
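
The function switch is a per-filesystem table of function pointers. Generic code calls through it without knowing which filesystem it is talking to. Below is a cut-down sketch with a single `lookup` entry and two toy filesystems. All names are hypothetical. Returning `-ENOTDIR` when a directory has no `lookup` operation is an assumption made for this model.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

struct inode;

/* Cut-down, hypothetical inode operations table; the real one also
 * has create, link, unlink, mkdir and the rest listed above. */
struct inode_operations {
    int (*lookup)(struct inode *dir, const char *name, unsigned long *ino);
};

struct inode {
    unsigned long i_ino;
    const struct inode_operations *i_op;  /* filled in by the filesystem */
};

/* Toy filesystem A: every name except "missing" exists. */
static int fsa_lookup(struct inode *dir, const char *name, unsigned long *ino)
{
    (void)dir;
    if (strcmp(name, "missing") == 0)
        return -ENOENT;
    *ino = 100 + strlen(name);
    return 0;
}

/* Toy filesystem B: a directory holding only "only". */
static int fsb_lookup(struct inode *dir, const char *name, unsigned long *ino)
{
    (void)dir;
    if (strcmp(name, "only") != 0)
        return -ENOENT;
    *ino = 7;
    return 0;
}

static const struct inode_operations fsa_iops = { fsa_lookup };
static const struct inode_operations fsb_iops = { fsb_lookup };

/* Filesystem-independent code: dispatch through the switch. */
static int vfs_lookup(struct inode *dir, const char *name, unsigned long *ino)
{
    if (!dir->i_op || !dir->i_op->lookup)
        return -ENOTDIR;  /* no lookup operation: not a directory */
    return dir->i_op->lookup(dir, name, ino);
}
```

Adding a filesystem means supplying another table. `vfs_lookup()` never changes.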

Buffer Cache

- Linked lists of blocks
  - free buffers
  - shared buffers
  - locked buffers
  - dirty buffers
  - used buffers
- Critical resource
- Hash-list for quick lookup
- Avoid locking
  - beware of races

Buffer structure

    struct buffer_head {
        char *b_data;
        unsigned long b_size;
        unsigned long b_blocknr;
        dev_t b_dev;
        unsigned short b_count;
        unsigned char b_uptodate;
        unsigned char b_dirt;
        unsigned char b_lock;
        ...
    };
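
A sketch of the hash-list lookup. It assumes a `(device ^ block) % table size` hash, and the table size, `insert_into_hash()` and `find_buffer()` are illustrative, not the kernel's code. Real code must also handle the races mentioned above. A buffer found here can be reclaimed or relocked while the caller sleeps, so it has to be re-checked afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define NR_HASH 61  /* hypothetical table size; a prime spreads keys well */

/* Trimmed buffer head: only the fields the lookup needs. */
struct buffer_head {
    unsigned short b_dev;
    unsigned long b_blocknr;
    struct buffer_head *b_next;  /* next buffer on the same hash chain */
};

static struct buffer_head *hash_table[NR_HASH];

/* Hash on the (device, block) pair. */
static unsigned int hashfn(unsigned short dev, unsigned long block)
{
    return (unsigned int)((dev ^ block) % NR_HASH);
}

static void insert_into_hash(struct buffer_head *bh)
{
    unsigned int h = hashfn(bh->b_dev, bh->b_blocknr);

    bh->b_next = hash_table[h];
    hash_table[h] = bh;
}

/* Walk one short chain instead of every buffer in the system. */
static struct buffer_head *find_buffer(unsigned short dev, unsigned long block)
{
    struct buffer_head *bh = hash_table[hashfn(dev, block)];

    for (; bh; bh = bh->b_next)
        if (bh->b_dev == dev && bh->b_blocknr == block)
            return bh;
    return NULL;
}
```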

Buffers

[Figure: buffer heads on hash chains and on the free, shared, locked, dirty and in-use lists]

Inode Cache

- Keeps track of inodes
  - VFS layer general information
  - pointers to low-level actor functions
  - union of per-filesystem information
- Simple LRU replacement
- Hash list for lookups

Inode Structure

    struct inode {
        dev_t i_dev;
        unsigned long i_ino;
        mode_t i_mode;
        unsigned short i_count;
        unsigned short i_flags;
        ...
        union {
            /* filesystem-specific data */
        } u;
    };
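
A small model of LRU replacement for the inode cache. It assumes a fixed pool of three slots, with inode number 0 marking an empty slot. A linear scan of the LRU list stands in for the hash list. `iget()` and `iput()` borrow the kernel names but are hypothetical simplifications. On a miss, the least recently used inode with no users is recycled.

```c
#include <assert.h>
#include <stddef.h>

#define NR_INODE 3  /* hypothetical, tiny pool so eviction is easy to see */

struct inode {
    unsigned short i_dev;
    unsigned long i_ino;            /* 0 marks an empty slot here */
    unsigned short i_count;         /* users; 0 means reclaimable */
    struct inode *i_prev, *i_next;  /* LRU list, most recent first */
};

static struct inode inode_table[NR_INODE];
static struct inode *lru_head;

static void lru_unlink(struct inode *in)
{
    if (in->i_prev) in->i_prev->i_next = in->i_next;
    else lru_head = in->i_next;
    if (in->i_next) in->i_next->i_prev = in->i_prev;
    in->i_prev = in->i_next = NULL;
}

static void lru_push_front(struct inode *in)
{
    in->i_prev = NULL;
    in->i_next = lru_head;
    if (lru_head) lru_head->i_prev = in;
    lru_head = in;
}

static void inode_cache_init(void)
{
    int i;

    lru_head = NULL;
    for (i = 0; i < NR_INODE; i++) {
        inode_table[i] = (struct inode){0};
        lru_push_front(&inode_table[i]);
    }
}

/* Find (dev, ino) or recycle the least recently used free inode;
 * NULL if every cached inode is in use. */
static struct inode *iget(unsigned short dev, unsigned long ino)
{
    struct inode *in, *victim = NULL;

    assert(ino != 0);
    for (in = lru_head; in; in = in->i_next) {
        if (in->i_dev == dev && in->i_ino == ino)
            break;                  /* cache hit */
        if (in->i_count == 0)
            victim = in;            /* last free one seen = least recent */
    }
    if (!in) {
        if (!victim)
            return NULL;
        in = victim;                /* real code would read it from disk */
        in->i_dev = dev;
        in->i_ino = ino;
    }
    in->i_count++;
    lru_unlink(in);
    lru_push_front(in);             /* now the most recently used */
    return in;
}

static void iput(struct inode *in)
{
    if (in->i_count)
        in->i_count--;
}
```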

Inodes

[Figure: inodes linked on the LRU list]

Name Cache

- Speeds up the “lookup()” operation
- Filesystem independent operations
  - “dcache_add()”
  - “dcache_lookup()”
- 2-level LRU list
  - first level filled by new entries...
  - ...move to second level at cache hit
- Unified hash list for lookups
- Negative caching

Name Cache

[Figure: name cache hash lists]

Minix FS

- Filesystem Layout
- Limitations
  - 16-bit inode and block numbers
  - 14-character fixed size directory entries
- Extensions
  - symlinks
  - 30-character directory entry choice
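
The arithmetic behind these limits assumes 1 KB blocks. With 16-bit block numbers, a Minix filesystem is capped at 2^16 × 1 KB = 64 MB. A directory entry is a 16-bit inode number followed by a fixed-size name field, so entries are 16 bytes with 14-character names and 32 bytes with the 30-character option. The helper names below are hypothetical.

```c
#include <assert.h>

#define MINIX_BLOCK_SIZE 1024UL  /* assumed 1 KB blocks */

/* Upper bound on filesystem size: 2^16 addressable 1 KB blocks. */
static unsigned long minix_max_fs_bytes(void)
{
    return 65536UL * MINIX_BLOCK_SIZE;
}

/* 2-byte inode number plus a fixed, NUL-padded name field. */
static unsigned int minix_dirent_size(unsigned int namelen)
{
    return 2 + namelen;
}

static unsigned int minix_dirents_per_block(unsigned int namelen)
{
    return (unsigned int)(MINIX_BLOCK_SIZE / minix_dirent_size(namelen));
}
```

With 30-character names, each directory block holds half as many entries (32 instead of 64).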

Ext2 FS

- Filesystem Layout
- Extended semantics on a per-file basis
  - Immutable files and safe deletions
  - Synchronous filesystem updates
  - Extendible (access control lists etc.)

NFS filesystem

- Translates the low-level actor calls directly into NFS requests
- Performance problems
  - Currently does only very limited name caching
  - Read caching hard due to buffer cache setup
  - Read-ahead not implemented (yet)
- NFS server done in user level
  - Security problems with the current implementation
  - Single-threaded: performance problems

/proc FS

- Virtual filesystem as in Plan 9
  - NOT the limited SysV version
  - designed to make system binaries version-independent
- Files:
  - in /proc: loadavg, uptime, meminfo, kmsg, version, kcore, modules, stat, devices, filesystems, ksyms, irq, dma
  - in /proc/net: unix, arp, route, dev, raw, tcp, udp, snmp, ...
  - in /proc/<pid>: mem, cwd, root, exe, fd, environ, cmdline, stat, statm, maps
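
Because /proc exports plain text, a utility can parse a file instead of digging through kernel memory. That is what makes such binaries independent of the kernel version. Below is a minimal sketch for `/proc/loadavg`. Only the first three fields (the 1-, 5- and 15-minute load averages) are parsed, and any later fields are ignored. `parse_loadavg()` and `read_loadavg()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the three load averages at the start of a /proc/loadavg line.
 * Returns 0 on success, -1 on a malformed line. */
static int parse_loadavg(const char *line, double avg[3])
{
    return sscanf(line, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]) == 3 ? 0 : -1;
}

/* Read the live file; only meaningful where /proc is mounted. */
static int read_loadavg(double avg[3])
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");
    int ok;

    if (!f)
        return -1;
    ok = fgets(buf, sizeof(buf), f) != NULL;
    fclose(f);
    return ok ? parse_loadavg(buf, avg) : -1;
}
```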

Memory management

- Strategy
- Implementation
- Buffer Cache
- Page Sharing
- Dynamic Allocation

Strategy

- Fundamental resource
  - Kernel internal data
  - User process memory
  - IO buffering
  - shared memory
- Maximize memory use
  - minimize free memory
  - maximize memory re-use

Implementation

- The basic allocation unit is the page
  - larger contiguous areas possible, but not guaranteed
- page memory management with a buddy system
  - in-use bitmaps for each logarithmic size
  - linked list of free blocks
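
A toy buddy allocator over an 8-page arena, showing the split and merge rules. A block of 2^k pages starting at block index i has its buddy at index i ^ 1 on the same level. Splitting hands out the lower half and frees the upper half. Freeing merges with the buddy for as long as the buddy is free. Plain per-order arrays stand in here for the bitmaps and free lists, and all names and sizes are hypothetical.

```c
#include <assert.h>

#define MAX_ORDER 4                     /* blocks of 1, 2, 4 or 8 pages */
#define NR_PAGES (1 << (MAX_ORDER - 1)) /* 8-page arena */

/* free_map[o][i] == 1: the block of 2^o pages at page i * 2^o is free. */
static unsigned char free_map[MAX_ORDER][NR_PAGES];

static void buddy_init(void)
{
    int o, i;

    for (o = 0; o < MAX_ORDER; o++)
        for (i = 0; i < NR_PAGES; i++)
            free_map[o][i] = 0;
    free_map[MAX_ORDER - 1][0] = 1;     /* whole arena is one free block */
}

/* Allocate 2^order pages; return the first page number, or -1. */
static int buddy_alloc(int order)
{
    int o, i;

    for (o = order; o < MAX_ORDER; o++) {
        for (i = 0; i < (NR_PAGES >> o); i++) {
            if (!free_map[o][i])
                continue;
            free_map[o][i] = 0;
            while (o > order) {         /* split: keep lower half, */
                o--;                    /* free the upper half (the buddy) */
                i <<= 1;
                free_map[o][i + 1] = 1;
            }
            return i << order;
        }
    }
    return -1;
}

/* Free 2^order pages at 'page', merging with free buddies. */
static void buddy_free(int page, int order)
{
    int i = page >> order;

    while (order < MAX_ORDER - 1 && free_map[order][i ^ 1]) {
        free_map[order][i ^ 1] = 0;     /* absorb the buddy */
        i >>= 1;                        /* merged block, one order up */
        order++;
    }
    free_map[order][i] = 1;
}
```

After every block has been freed, the merges rebuild the single 8-page block.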

User memory

- Demand-allocated
  - demand-loading of executables
  - allocation of user memory on use
- Shared
  - extensive sharing of code and data pages with C-O-W
    - with same executable
    - with same shared library
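
A sketch of copy-on-write with a per-page share count, using a one-page "address space". Sharing write-protects both mappings. On a write fault, the frame is copied only if someone else still uses it. A sole owner simply gets write access back. `struct page`, `struct mapping`, `share_cow()` and `do_wp_fault()` are hypothetical, and a 4 KB page size is assumed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumed page size */

struct page {
    int count;              /* mappings sharing this frame */
    char data[PAGE_SIZE];
};

struct mapping {
    struct page *page;
    int writable;
};

/* fork(): share the frame and write-protect both sides. */
static void share_cow(struct mapping *parent, struct mapping *child)
{
    parent->page->count++;
    parent->writable = 0;
    child->page = parent->page;
    child->writable = 0;
}

/* Write fault on a write-protected mapping. Returns 0 on success,
 * -1 if a private copy cannot be allocated. */
static int do_wp_fault(struct mapping *m)
{
    struct page *copy;

    if (m->page->count == 1) {      /* sole owner: no copy needed */
        m->writable = 1;
        return 0;
    }
    copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    memcpy(copy->data, m->page->data, PAGE_SIZE);
    copy->count = 1;
    m->page->count--;               /* drop our share of the old frame */
    m->page = copy;
    m->writable = 1;
    return 0;
}
```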

Buffer Cache

- Dynamic
  - shrinks on demand
  - grows when there is free memory
  - adapts naturally to different circumstances
- Shares pages with user process memory
  - Buffers naturally aligned
  - Sharing not forced, but made easy for the filesystem code
  - Normal C-O-W behaviour for private mappings

Dynamic Allocation

- kmalloc()
  - small areas, occasional use
  - major internal data structures do their own page-level allocation and partitioning
- vmalloc()
  - multiple pages
  - virtual memory in kernel space
  - Non-swappable
  - note: DMA (vmalloc() memory is not physically contiguous)

Device Drivers

- Basics
- Character Devices
- Block Devices
- SCSI Devices
- Floating point emulation
- Sound driver

Interrupts

- fast interrupts
  - For low-level critical handlers (serial lines)
  - run with all interrupts disabled and minimal state saving
- normal interrupts
  - For slower hardware interrupts
  - run with other interrupts enabled and normal kernel stack
- bottom half handlers
  - For non-timing-critical work that is slow
  - run with all interrupts enabled

Interrupts, part 2

- Timers
  - Special case of bottom half handler
  - Limited accuracy (100 Hz system clock)
  - Timeouts and slow operations without interrupts (e.g. floppy motor timing)
- Polling
  - With known fast operations
  - When all else fails...
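
With a 100 Hz clock, a timer can only be expressed in whole 10 ms ticks. A driver therefore rounds a delay up, so the timer never fires early. The helpers below are hypothetical. `time_reached()` compares tick counts by their difference, which stays correct when the counter wraps around, assuming two's-complement conversion from unsigned to signed.

```c
#include <assert.h>

#define HZ 100  /* system clock ticks per second */

/* Delay in milliseconds -> clock ticks, rounded up; anything finer
 * than 1000 / HZ = 10 ms is lost. */
static unsigned long ms_to_ticks(unsigned long ms)
{
    return (ms * HZ + 999) / 1000;
}

/* Non-zero once tick count 'now' has reached 'expires'. */
static int time_reached(unsigned long now, unsigned long expires)
{
    return (long)(now - expires) >= 0;
}
```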

DMA

- DMA allocation

      request_dma(nr, "device");
      enable_dma(nr) / disable_dma(nr);
      free_dma(nr);

- virtual <-> physical addresses
  - kernel address space 1:1 mapping
  - ... except for memory allocated with vmalloc()
- ISA 16MB DMA limit
  - bounce buffers required for DMA drivers
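
The ISA DMA controller generates only 24-bit addresses. A driver therefore has to check whether a transfer stays below 16 MB, and if not, route it through a bounce buffer in low memory and copy the data afterwards. A minimal sketch of that check follows. `needs_bounce()` is hypothetical, and `phys` is assumed to be a physical address, which under the 1:1 mapping excludes vmalloc() memory.

```c
#include <assert.h>

/* 16 MB: the limit of 24-bit ISA DMA addressing. */
#define ISA_DMA_LIMIT 0x1000000UL

/* Non-zero if 'len' bytes at physical address 'phys' do not fit below
 * the limit. Written so that phys + len cannot overflow. */
static int needs_bounce(unsigned long phys, unsigned long len)
{
    return phys >= ISA_DMA_LIMIT || len > ISA_DMA_LIMIT - phys;
}
```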

IO Ports

- IO port allocation

      check_region(start, size);
      snarf_region(start, size);

- IO port usage

      inb(port), inb_p(port)
      outb(value, port), outb_p(value, port)
      insb(port, address, count)
      outsb(port, address, count)
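
A user-space model of the check-then-claim pattern for IO port ranges. A claimed range [start, start + size) conflicts with any range that overlaps it. `model_check_region()` and `model_claim_region()` are hypothetical stand-ins, not the kernel's `check_region()`/`snarf_region()`. Their return convention (0 if free, `-EBUSY` otherwise) is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>

#define MAX_REGIONS 16  /* hypothetical table size */

struct region { unsigned int start, size; };

static struct region regions[MAX_REGIONS];
static int nr_regions;

/* 0 if ports [start, start + size) are unclaimed, -EBUSY otherwise. */
static int model_check_region(unsigned int start, unsigned int size)
{
    int i;

    for (i = 0; i < nr_regions; i++)
        if (start < regions[i].start + regions[i].size &&
            regions[i].start < start + size)
            return -EBUSY;
    return 0;
}

/* Record a claim after checking for overlap. */
static int model_claim_region(unsigned int start, unsigned int size)
{
    if (nr_regions == MAX_REGIONS || model_check_region(start, size))
        return -EBUSY;
    regions[nr_regions].start = start;
    regions[nr_regions].size = size;
    nr_regions++;
    return 0;
}
```

In the test, a serial port at 0x3f8 is claimed, so any overlapping probe is refused, while the adjacent range 0x3f0-0x3f7 is still free.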

Probing

- PC hardware not designed for autoprobing
  - there be dragons here...
- But using defaults or counting on the user knowing what he is doing is even less productive
- Do (in order of preference):
  - check BIOS signatures
  - read memory-mapped IO
  - read IO ports
  - write IO ports

Character Devices

- tty driver
  - virtual consoles
  - serial
  - pseudo tty
- memory driver
  - mem, kmem, port
  - null, zero, full
- misc
  - line printer, mice, tapes
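
Example

    /* MIN(a, b) is assumed to yield the smaller value */
    static int hello_read(struct inode * inode, struct file * file,
                          char * buf, int size)
    {
        int i = MIN(size, 12);          /* strlen("Hello World\n") == 12 */

        /* copy to user space; file->f_pos is ignored,
           so every read returns the greeting again */
        memcpy_tofs(buf, "Hello World\n", i);
        return i;
    }

    static struct file_operations hello_fops = {
        NULL,                           /* lseek */
        hello_read,                     /* read */
        /* remaining operations NULL */
    };

    /* at driver initialisation time: */
    register_chrdev(HELLO_MAJOR, "hello", &hello_fops);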

Block Devices

- Basic operation: filling requests
  - Only have to know how to fill in a given area
  - ... but optimizations usually needed to make it fly
  - ... and error handling makes the drivers large
- Raw devices not normally supported: go for the default case and optimize it for cached accesses
- Current drivers:
  - standard harddisk, XT harddisk, floppy, ramdisk and Mitsumi, Sony CDU31A and SBP CD-ROM

SCSI Devices

- Common high-level code
- Low-level driver needs to support just a few basic operations
  - detect(), info(), bios_param(), queuecommand(), done() callback, command(), abort(), reset()

Floating point emulator

- Pseudo-driver
- Emulates a 486, but
  - self-modifying code
  - precision
  - speed
  - Note that a 387 does some of this even worse
- Useful for
  - no actual FPU hardware
  - co-processor bugs

FPU emulator, part 2

- problems -> details:
  - 32-bit kernel, but
    - 16-bit segment handling for Wine
    - 16-bit pseudo-segments for DOSEMU
    - 16-bit operations
- Impacts performance directly:
  - assembly language
  - the actual math

Sound Driver

- AdLib
- SoundBlaster family
  - + compatibles
- ProAudioSpectrum16, ProAudioStudio16, ...
- Gravis Ultrasound
- MPU-401

Further info

- Newsgroups:
  - comp.os.linux.*
  - announce, development, admin, help, misc
- Mailing lists:
  - linux-activists-request@niksula.hut.fi
- FTP sites:
  - tsx-11.mit.edu: pub/linux
  - sunsite.unc.edu: pub/Linux
  - ftp.funet.fi: pub/OS/Linux